Researchers use community-detection algorithms to reveal large-scale organization in biological
and social networks, but community detection is useful only if the communities are significant and 
not a result of noisy data. To assess the statistical significance of the network communities, or the 
robustness of the detected structure, one approach is to perturb the network structure by removing 
links and measure how much the communities change. However, perturbing sparse networks is 
challenging because they are inherently sensitive; they shatter easily if links are removed. Here we 
propose a simple method to perturb sparse networks and assess the significance of their communities. 
We generate resampled networks by adding extra links based on local information, then we aggregate 
the information from multiple resampled networks to find a coarse-grained description of significant 
clusters. In addition to testing our method on benchmark networks, we use our method on the sparse 
network of the European Court of Justice (ECJ) case law, to detect significant and insignificant areas
of law. We use our significance analysis to draw a map of the ECJ case law network that reveals 
the relations between the areas of law. 



I. INTRODUCTION 

Network theory provides a good framework for study- 
ing systems composed of many interacting components. 
Recently, researchers have been interested in highlighting 
highly interconnected structures, communities, in biolog- 
ical and social networks [IHS] , because often communities 
correspond to behavioral or functional components. For 
example, in social networks, communities can represent 
friendship groups; on the web, they can represent related 
pages on a specific topic; and in metabolic networks, they 
can represent cycles or other functional groupings. Here 
we show that communities can also capture disciplines 
of judgements in case law systems [5]. However, sim- 
ilar to many real-world networks, the network of ECJ 
case law is sparse because of missing links. The chal- 
lenge in finding significant structures in sparse networks 
is twofold; random noise directly propagates to the com- 
munity results, and communities easily shatter because 
of missing links. To find reliable communities in sparse 
networks with missing links, here we propose a simple 
method based on link prediction. First we show that our 
method performs well on benchmark networks. Then we 
apply our method to the ECJ case law network and gen- 
erate a significance map of EU law. 

Researchers use two main approaches to find statis- 
tically significant communities in networks: approaches 
based on underlying null models and approaches based on 
perturbation techniques. In the null-model approaches, 
communities are significant if the probability of finding 
them in a random network is lower than a given threshold
[10-12]. This is a solid approach when we are interested
in how a network was formed. But when researchers are 
interested in highlighting functional aspects of an instan- 
tiated network, such as dynamics on a given network, 
they often use perturbation techniques [13-15]. Taking
this approach, researchers assume random noise in the 
data. When they perform the statistical analysis, they 
repeatedly perturb and cluster the data and then aggre- 
gate the results. Therefore, they can use any cluster- 
ing algorithm and are not restricted to a particular null 
model. But for many sparse networks, the main source 
of error is not random noise in the data, but rather miss- 
ing links with different effects on the clustering. For ex- 
ample, many clustering algorithms identify more clusters 
in sparse networks than in the corresponding networks 
without missing links [16, 17]. To take this shattering
effect into account when we perform significance analy- 
sis on sparse networks with missing links, we introduce 
resampling based on link prediction. 

To assess the significance of sparse networks with miss- 
ing links, we combine perturbation techniques and link 
prediction. In practice, we resample sparse networks by 
completing triangles. For undirected networks, complet- 
ing triangles corresponds to the simple and effective link 
prediction method called common neighbor [18]. After
explaining our approach in detail, first we show that we 
can recover shattered modules in benchmark networks as 
long as the mixing between modules is moderate and not 
too many links are deleted. Then we apply the method 
to identify significant areas in the network of ECJ case 
law. This network consists of more than 8,000 court cases 
connected by about 32,000 citations over the time period 
1954-2010, clearly a sparse network. We create a signifi- 
cance map and connect several insignificant clusters into 
complete areas of EU law. 

II. RESAMPLING BASED ON COMPLETING TRIANGLES



To generate resamples of inherently sensitive sparse 
networks, we need a method that efficiently adds extra 
links while preserving the core structure of the network. 
Otherwise, if we apply community detection algorithms 
for partitioning sparse networks with missing links, we 
will often find small shattered modules. We note that 
the problem of aggregating shattered modules by adding 
links is similar to the problem of predicting missing links. 
Missing-link prediction methods operate by estimating
the likelihood of a link between a pair of vertices based
on their similarity. To evaluate the similarity between
vertices based on the structural properties of the network,
indices like common neighbors [18], Jaccard coefficient
[19], degree product, shortest paths, and hierarchical
structure [20] have been proposed and used to
predict future links on real data [21]. All similarity indices
use specific assumptions about the positions of the
missing links that often make them complicated and computationally
expensive to calculate. But these assumptions
might not reveal meaningful information in all real
networks. To analyze the significance of a network's communities
by generating resampled networks, however, we do
not need to exactly predict missing links; we only need
to add extra links in a non-destructive way so we can
measure the robustness of the communities. Therefore,
we perturb sparse networks with a simple and general
method: triangle completion. With this method, we can
aggregate related scattered communities without making 
specific assumptions about the network. Triangles are 
the smallest unit of communities, and completing them 
strengthens local connections and the important core of 
the communities. As a result, shattered communities 
combine with each other and the community size grows. 
Figure 1 shows an example network in which black links
indicate existing links in the network and the four inner 
circles correspond to communities in the network. When 
we add links by completing the triangles (dashed lines), 
we aggregate the small communities into two big com- 
munities. 




FIG. 1. Completing triangles followed by clustering
aggregates shattered communities. Dashed lines show
different possibilities for completing triangles.
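
As an illustration of the resampling step described in this section, the
following is a minimal sketch of triangle completion on an undirected network,
not the authors' implementation. The function names, the adjacency-dictionary
representation, and the choice to draw each new link uniformly from all open
triads (and to recompute the candidates after every addition) are assumptions
made for the example.

```python
import random
from itertools import combinations


def open_triads(adj):
    """Return all non-adjacent node pairs (u, v), u < v, that share a neighbor."""
    candidates = set()
    for neighbors in adj.values():
        for u, v in combinations(sorted(neighbors), 2):
            if v not in adj[u]:
                candidates.add((u, v))
    return candidates


def complete_triangles(adj, n_new_links, seed=None):
    """Return a copy of adj with up to n_new_links triangle-closing links added.

    adj maps each node to the set of its neighbors (undirected network).
    Assumption: every open triad is equally likely to be closed.
    """
    rng = random.Random(seed)
    resampled = {node: set(nbrs) for node, nbrs in adj.items()}
    for _ in range(n_new_links):
        candidates = sorted(open_triads(resampled))
        if not candidates:
            break  # no open triads left to close
        u, v = rng.choice(candidates)
        resampled[u].add(v)
        resampled[v].add(u)
    return resampled
```

Calling `complete_triangles` repeatedly with different seeds yields a set of
link-added networks that can each be passed to any clustering algorithm.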



III. RESULTS AND DISCUSSION 

A. Benchmark networks 

To validate our method, we tested triangle completion
followed by clustering with the Infomap algorithm [22] on
artificial networks with a built-in community structure.
The benchmark graphs that we use resemble real-world
networks and were introduced by Lancichinetti et al. [23].
The benchmark networks have tunable exponents, and
we use exponent -2 for the degree distribution and exponent
-1 for the community size distribution. Further,
the mixing parameter $\mu$ determines the ratio between
the external degree of a node with respect to its community
and the total degree of that node. We use this
framework to generate undirected networks with built-in
community structures. Figure 2A schematically shows a
network with 100 nodes and four built-in communities.
By removing 50% of the links, communities fall apart and
small modules are detected (Fig. 2B). But with triangle
completion, related shattered modules are combined with
each other (Fig. 2C).

FIG. 2. Triangle completion aggregates shattered
modules. A Original network with 4 communities, B removing
links leads to small shattered communities, C completing
triangles in the shattered network integrates small communities.
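
To make the definition of the mixing parameter concrete, the sketch below
measures it empirically on a network with a known partition, as the average
over nodes of external degree divided by total degree. In the benchmark
itself the mixing parameter is an input to the generator; the function name
and the data layout here are assumptions for illustration only.

```python
def mixing_parameter(adj, community):
    """Average fraction of each node's links that leave its own community.

    adj maps each node to the set of its neighbors; community maps each
    node to a community label. Isolated nodes are skipped.
    """
    ratios = []
    for node, neighbors in adj.items():
        if not neighbors:
            continue
        external = sum(1 for nb in neighbors if community[nb] != community[node])
        ratios.append(external / len(neighbors))
    return sum(ratios) / len(ratios) if ratios else 0.0
```

A value below 0.5 means that, on average, nodes have more links inside their
own community than to other communities.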

To quantitatively show that triangle completion per- 
turbs the network in a non-destructive way, we used nor- 
malized mutual information (NMI) to measure the sim- 
ilarity between the community structure of the original 
network and the community structure of the perturbed 
network [Ml 123 . 
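
For readers who want to reproduce the comparison, here is a minimal sketch of
normalized mutual information between two partitions of the same node set. It
assumes the common normalization 2 I(A;B) / (H(A) + H(B)); the text does not
state which normalization was used, so this choice, like the function name,
is an assumption.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI of two partitions given as equal-length sequences of module labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("partitions must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    entropy_a = -sum(c / n * log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual_info = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions put all nodes in a single module
    return 2 * mutual_info / (entropy_a + entropy_b)
```

Identical partitions give a value of one regardless of how the modules are
labeled, and statistically independent partitions give a value of zero.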

Figure 3 shows the result of using the perturbation
method on benchmark networks with 1000 nodes and two 
different levels of mixing between communities. We gen- 
erated sparse networks with missing links by randomly 
removing 30 and 60 percent of the links in the benchmark
networks. The first row shows the result of triangle
completion for low mixing, $\mu = 0.25$, and well-defined
communities. Low $\mu$, less than 0.5, means that, on average,
each node has more links going to nodes within
the same community than to nodes in other communities.
So when we use our triangle completion method for
perturbing such networks, we strengthen the structure 
inside the communities more than the structure between 
the communities. Therefore, we amplify the coarse-grained
structure of the network, and the community structure of
the perturbed network will be similar to the community
structure of the original network, regardless of the number
of extra links that we added. This reasoning is valid
both when we perturb the original raw network and when 
we perturb the reduced networks. By adding extra links 
to the reduced networks, shattered and weakly connected 
modules aggregate and module sizes grow. For reference, 
the gray lines in Fig. 3 show that if we randomly add
links, we completely destroy the community structure of 
the network. 
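
The two reference perturbations used in this test, random link removal to
create a sparse network and random link addition as a baseline, can be
sketched as follows. The helper names and the adjacency-dictionary layout are
assumptions for illustration; the quadratic enumeration of non-links is only
suitable for small examples.

```python
import random


def edge_list(adj):
    """Sorted list of undirected links (u, v) with u < v."""
    return sorted((u, v) for u in adj for v in adj[u] if u < v)


def from_edges(nodes, edges):
    """Build an adjacency dictionary on the given nodes from a list of links."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def remove_random_links(adj, fraction, seed=None):
    """Return a copy of adj with the given fraction of links removed at random."""
    rng = random.Random(seed)
    edges = edge_list(adj)
    n_keep = len(edges) - round(fraction * len(edges))
    return from_edges(adj, rng.sample(edges, n_keep))


def add_random_links(adj, n_new_links, seed=None):
    """Return a copy of adj with up to n_new_links links added between random non-adjacent pairs."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    non_edges = [
        (u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:] if v not in adj[u]
    ]
    extra = rng.sample(non_edges, min(n_new_links, len(non_edges)))
    return from_edges(adj, edge_list(adj) + extra)
```

Unlike triangle completion, `add_random_links` ignores local structure, which
is why it serves only as a baseline here.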

We use the ratio between the average module size of the
perturbed network, $\langle S_i \rangle_{\text{link-added}}$, and the average
module size of the original network, $\langle S_i \rangle_{\text{original}}$, to
quantify module growth:

$$\text{MS ratio} = \frac{\langle S_i \rangle_{\text{link-added}}}{\langle S_i \rangle_{\text{original}}} \tag{1}$$
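
A minimal sketch of how the module size ratio in Eq. (1) could be computed
from two partitions, each given as a mapping from node to module label. The
function names and the data layout are assumptions for illustration, and
module size is taken to be the number of nodes in the module.

```python
from collections import Counter


def average_module_size(membership):
    """Mean number of nodes per module for a node -> module mapping."""
    sizes = Counter(membership.values())
    return sum(sizes.values()) / len(sizes)


def module_size_ratio(link_added, original):
    """Average module size after link addition divided by the original average."""
    return average_module_size(link_added) / average_module_size(original)
```

For example, if eight nodes form four modules of two nodes in the original
partition and two modules of four nodes after link addition, the ratio is two.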



When the built-in community structure is well-defined
for low $\mu$, the module size ratio does not exceed one and
the community structure never collapses. On the other
hand, in networks with high $\mu$ and a comparable number
of links within and between communities, we destroy the
community structure. That is, when we use triangle completion
to perturb the network, module sizes grow quickly
and finally collapse (Fig. 3C,D). We find that $\mu = 0.5$ is
the threshold below which triangle completion works (Fig.
4). When $\mu$ is higher than 0.5, there are not enough regularities
in the network to use for non-destructive perturbation.

By repeatedly completing triangles and clustering link- 
added networks, we can generate bootstrap resamples for 
assessing significant communities in sparse networks with 
missing links. In the next section, we use this resample 
technique to identify significant and insignificant commu- 
nities in the network of ECJ case law. 



B. ECJ case law network 

Case law is continuously evolving and changing over 
time. New cases build on old cases and areas of law 
emerge, vanish, evolve or remain constant over time. Citation
patterns between cases allow us to track and capture
the evolution of areas of law. For example, Bommarito
II et al. used a dynamic citation network to find
meaningful clusters in the network of the United States Supreme
Court by means of a distance measure [5]. Here we use
approximately 32,000 citations between more than 8,000 
court cases (1954-2010) from the Court of Justice of the 
EU to better understand the overall structure of ECJ 
case law. 

FIG. 3. Test of triangle completion on unweighted
undirected benchmark networks. The panels show the
similarity between the community structure of the original
and the perturbed networks, in A and B for low module mixing
and in C and D for high module mixing. Panels A and C
quantify the similarity in terms of the normalized mutual
information (NMI) and panels B and D quantify the similarity
in terms of the module size ratio. Filled circles correspond to
the similarity after link removal. Open symbols correspond
to the similarity after subsequently adding links by triangle
completion (colored circles) and random link addition (gray
squares). Link addition starts at 0, 30, and 60 percent link
removal. Each point corresponds to an average over 100
networks.

FIG. 4. The success of triangle completion depends
on the module mixing. Similarity between the community
structure of the original network and the perturbed networks
as a function of the module mixing parameter $\mu$. In panel A
the similarity is quantified in terms of the normalized mutual
information (NMI) and in panel B the similarity is quantified
in terms of the module size ratio. No links were removed prior
to link addition. Each point corresponds to an average over 100
networks.

The European Court of Justice ensures the correct interpretation
and application of EU law [17]. When it
comes to the judgments of the ECJ, legal scholars traditionally
begin by distinguishing cases primarily concerning
substantive issues from cases primarily concerning
constitutional issues. Substantive issues regard questions
about specific rights and obligations of individuals, Member
States, and EU institutions under EU law. In contrast,
constitutional issues regard questions about the division
of power between the EU and Member States or the duties
of Member States to enforce substantive rights. We find
that the distinction between substantive and constitutional
issues is supported by the network of ECJ case
law. In addition to being substantive or constitutional,
every judgment also has a procedural dimension in the
sense that the ECJ enjoys jurisdiction over each case on 
one of eleven possible grounds [28]. More information 
about the Court's cases is available on the EU law web- 
site [211 . 

We generated and clustered bootstrap networks from 
the network of ECJ case law to detect significant areas 
of law and to better understand the overall structure. In 
the time-directed network of ECJ case law, each vertex 
corresponds to a court case and an arc from case A to 
case B shows that the newer case A cites the older case 
B, as schematically illustrated in Fig. 5. Similar to many
other time-directed networks, the network of ECJ case 
law is sparse, as, in the beginning, there were few cases 
to cite. However, because the number of cases increases 
with time, new cases have more options to cite. Completing
the triangles in the time-directed network of ECJ case
law corresponds to one of the three situations depicted in
Fig. 5. In all three situations, the added citation corresponds
to a potential citation that we predict could have
been considered and materialized in the first place.





FIG. 5. Three possibilities for completing triangles in 
the time-directed network of ECJ case law. Given two 
citations between three cases, A being more recent than B, 
which in turn is more recent than C, we can complete triangles 
in three different situations. A If a new case A cites two older 
cases B and C, but B does not cite C, we can make B cite C. 
B If a new case A cites B, and B cites C but A does not cite 
C, we can make A cite C. C If two new cases A and B both 
cite an old case C and the newest case A does not cite B, we 
can make A cite B. 
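
The three situations in Fig. 5 can be treated uniformly: whenever two cases
are not linked by a citation but are both linked to a third case, the newer
of the two can be made to cite the older. The sketch below enumerates these
candidate citations. The function name, the data layout, and the decision to
skip pairs with identical time stamps are assumptions made for illustration.

```python
from itertools import combinations


def candidate_citations(cites, time):
    """Return the set of (newer, older) citations that would complete a triangle.

    cites maps a case to the set of older cases it cites; time maps every
    case to a sortable time stamp (larger means more recent).
    """
    # Undirected view of the citation network: who is linked to whom.
    linked = {case: set() for case in time}
    for citing, cited_cases in cites.items():
        for cited in cited_cases:
            linked[citing].add(cited)
            linked[cited].add(citing)

    candidates = set()
    for neighbors in linked.values():
        for x, y in combinations(sorted(neighbors), 2):
            if y in linked[x] or time[x] == time[y]:
                continue  # already linked, or no time order to orient the arc
            newer, older = (x, y) if time[x] > time[y] else (y, x)
            candidates.add((newer, older))
    return candidates
```

With cases A, B, and C ordered from newest to oldest, the three panels of
Fig. 5 yield the candidates (B, C), (A, C), and (A, B), respectively.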

To show that our perturbation method does not destroy
the core structure of the law network, we would
like to compare the community structure of the link- 
added network to the community structure of the original 
raw network in terms of NMI. But the actual commu- 
nity structure of the original raw network is not known 
in this case. To overcome this problem, we use the case 
law directory code, the official classification system of the 
court, as our reference point. With this reference point, 
the NMI will be low but when we complete triangles we 
can use the trend of the NMI to validate our method. 
As Fig. 6 shows, perturbing the ECJ case law network
by completing triangles does not destroy the core struc- 
ture of the network. For example, even when we make 
the network 12 times denser, NMI stays almost constant, 
but at the same time, the module sizes grow as we desire. 

FIG. 6. Completing triangles in the court case network
generates non-destroyed resampled networks. A
Normalized mutual information (NMI) between the original
network and the link-added networks as a function of the link
ratio. B Module size ratio between the original network and
the link-added networks as a function of the link ratio. Each
point corresponds to an average over 100 runs.

For a significance analysis of the ECJ case law net- 
work, we first partition the network with a clustering 
algorithm to capture regularities in the raw network. To 
cluster with respect to citation flow between the court 
cases, we use the map equation framework with a generalized
flow model for time-directed networks [22]. However,
we emphasize that the significance analysis method
works for any clustering algorithm. To assess the signifi- 
cance of detected clusters, we generate 100 resample net- 
works by the triangle completion method without making 
any assumption about the underlying distribution of the 
resampled networks. Each resample network has twice 
the number of links as the raw network. Then we parti- 
tion all resampled networks by using the same clustering 
method we used for the raw network. To identify sig- 
nificant clusters, cluster cores, we search for the biggest 
subset of nodes in each cluster that are clustered together
in more than 90% of the resampled networks. We define
the size of a subset to correspond to the number
of nodes in the subset and also to the volume of flow 
through the subset, weighted equally. So by finding the 
core of each cluster, we can assess which nodes signifi- 
cantly belong to a cluster and which do not. In addition 
to identifying significant and insignificant nodes within 
each cluster, the resampled networks can provide us with 
information about which clusters are significantly stand-alone
and which are probably subsets of other clusters.
We consider a cluster as significantly stand-alone if its 
core is not partitioned with another cluster in at least 
90% of the resampled networks. That is, two clusters 
are mutually insignificant if their cores are partitioned 
together in more than 10% of the resampled networks. 
In this regard, each cluster could be insignificant with 
more than one other cluster, which means there is not 
enough support from the data for these clusters to exist 
as significantly stand-alone. 
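
The significance analysis above can be sketched in code under simplifying
assumptions. The example below is not the search procedure used in this work:
it uses a greedy pruning heuristic instead of a search for the truly biggest
subset, it measures subset size by node count only (ignoring flow volume),
and it reads "partitioned together" as all nodes of both cores sharing one
module. All function names are hypothetical.

```python
from collections import Counter


def support(subset, resamples):
    """Fraction of resampled partitions in which all nodes of subset share a module."""
    together = sum(1 for part in resamples if len({part[n] for n in subset}) == 1)
    return together / len(resamples)


def greedy_core(cluster, resamples, threshold=0.9):
    """Prune nodes from cluster until the rest co-cluster in more than threshold of resamples.

    Greedy heuristic: repeatedly drop the node that most often falls outside
    the majority module of the remaining nodes.
    """
    core = set(cluster)
    while len(core) > 1 and support(core, resamples) <= threshold:
        misplaced = Counter()
        for part in resamples:
            majority = Counter(part[n] for n in sorted(core)).most_common(1)[0][0]
            misplaced.update(n for n in core if part[n] != majority)
        core.remove(max(sorted(core), key=lambda n: misplaced[n]))
    return core


def mutually_insignificant(core_a, core_b, resamples, threshold=0.1):
    """True if the two cores end up in one module in more than threshold of resamples."""
    return support(set(core_a) | set(core_b), resamples) > threshold
```

Here `resamples` is a list of node-to-module mappings, one per clustered
resampled network, and the thresholds mirror the 90% and 10% levels in the text.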

Figure 7 shows the map of the ECJ case law network
illustrating the 40 top clusters, which we have manually 
named by analyzing which cases are clustered together. 
The size of nodes and links represent the citation flow 
within and between clusters, and we have highlighted 
mutually insignificant clusters by blue shaded areas. 

Several of the identified clusters represent well- 
established areas of law. One example is Equal treat- 
ment (125 cases with 25 cases in the significant core, or 
25/125 cases for short), which aggregates cases concern- 
ing discrimination of individuals based on nationality. 
Less intuitive, but seemingly valid, is the clustering of 
cases concerning the justification of such discrimination 
into a separate cluster, Justifying unequal treatment of 
persons (113/134 cases). Interestingly, completing triangles
aggregates not these two clusters but the latter with
cases concerning Member States' (MS) justification of
other violations of substantive rights in the highlighted
area MS justifying restrictions of basic freedoms in Fig. 7.
Legal scholars have speculated about a convergence of these
areas of law without being able to conclusively demonstrate
this trend. Another example of a structure that
does not fit squarely into the traditional legal classification
is Borderline cases in the internal market (36/74
cases). The cluster works as a hub between different areas
of law, bringing together cases involving several different 
substantive issues, including inter alia equal treatment. 

The significance map in Fig. 7 demonstrates that a
single clustering of the sparse network is insufficient and 
can be misleading. For example, the map contains two 
clusters representing cases concerning Value Added Tax 
(VAT) (83/113 and 89/101 cases, respectively), even 
though there are no considerable differences between 
cases belonging to the two clusters. The significance analysis
reveals that the two clusters are not significantly
stand-alone, because the significant cores are clustered
together in 80 percent of all bootstrap networks. By
completing triangles and aggregating the clusters, we can 
resolve the problem caused by missing links. The same is 
true for Public service contracts (33/60 and 25/46 cases 
with 83 percent co-clustering of significant cores). The 
same is also true for Infringement proceedings (10/58, 
34/44, and 2/51 cases with 31 percent co-clustering between
the least co-clustered pair of significant cores) and
Adoption & review of EU legislation (51/116, 31/43, and
5/38 cases with 31 percent co-clustering between the least
co-clustered pair of significant cores). These clusters are 
also interesting because the cases are clustered based on 
the grounds for jurisdiction (procedural clusters), which 
would likely be absent in a more traditional legal catego- 
rization of the case law. 

We also find, somewhat surprisingly from a legal perspective,
that substantive, constitutional, and procedural
clusters are closely related. For example, we find that
there is a strong relationship between National procedu- 
ral autonomy (28/77 cases), which aggregates cases con- 
cerning the constitutional issue of procedural adequacy 
of national courts enforcing EU law, and The principle of 
equal pay (74/84 cases), a cluster representing the sub- 
stantive issue of the right of men and women to equal pay 
for equal work. The pattern of interconnected substan- 
tive and constitutional clusters remains on the level of 
aggregated clusters. Completing triangles and aggregat- 
ing mutually insignificant clusters reveal a strong rela- 
tionship between the highlighted constitutional area Ef- 
fective enforcement and the highlighted substantive area 
Equality between men and women. 

These results confirm that combining our resampling 
method with the significance analysis of the preliminary 
clusters can provide reliable aggregated clusters that help 
us better understand the modular organization of a sys- 
tem with missing information. 

IV. CONCLUSIONS 

Using communities as the principal component of com- 
plex systems is reliable only if the communities are sta- 
tistically significant and not the result of noisy or incom- 
plete data. To assess the significance of communities in 
networks with missing links, we have suggested a simple 
approach that perturbs the sparse networks in a con- 
structive way by adding links based on triangle comple- 
tion. The remaining challenge is to estimate the optimal 
number of links to be added, but our benchmark tests 
indicate that results are robust to the number of added 
links. We used our method to identify significantly stand- 
alone communities and aggregate mutually insignificant 
communities in the sparse network of European Court of 
Justice case law. With a significance map of ECJ case 
law, for the first time we can analyze the large-scale organization
of European law. We have, for example, identified
structures and relationships that do not fit into
the traditional legal classification system and empirically
confirmed trends that legal scholars have only speculated
about.

FIG. 7. Map of ECJ case law. We partitioned 8,200 court case documents with 32,000 citations. Afterwards, we generated
100 resampled networks using the triangle completion method. By clustering these resampled networks and comparing them
to the clustering of the raw network, we can estimate how much support the data provide in partitioning the raw network. The 
map represents the 40 top modules. Insignificant clusters and their mutually insignificant friends are shaded with blue areas. 